Hello 👋

Date | Time | Location | Topic
Wednesday, 19-02-2025 | 13.30-16.30 | Van Steenis room E0.02A | Welcome; Intro to R/RStudio; R Basics
Friday, 21-02-2025 | 13.30-16.30 | Van Steenis room E0.02A | Project organisation; Cleaning data
Monday, 24-02-2025 | 13.30-16.30 | Van Steenis room E0.02B | Intro to Statistics; EDA; Visualising data
Friday, 28-02-2025 | 13.30-16.30 | Van Steenis room A2.02A (Corrie Bakelzaal) | Visualising data; Transforming data
Monday, 03-03-2025 | 13.30-16.30 | Van Steenis room E0.02B | Transforming data; Modelling data
Wednesday, 05-03-2025 | 13.00-17.00 | Van Steenis room E0.02B | Communicating data

R

Pronounced /’Arrrgh/

Why R?

Because it’s the best!

End of presentation.

Artwork by @allison_horst.

Why R?

R is a free and open source software environment for statistical computing and graphics

More than 20,000 packages are available on CRAN

The R community is pretty cool

Header text "R learners" above five friendly monsters holding up signs that together read "we believe in you."

Artwork by @allison_horst.

Why RStudio?

RStudio is an integrated development environment (IDE) specifically for R

It provides a bunch of extra features to make using R a delight!

tidyverse

The tidyverse is a collection of R packages sharing the same data science philosophy

It provides a nice workflow for cleaning, visualising, and transforming data

Aspects of ‘base R’ will also be covered

About the materials

The materials are not enough to cover all important topics.

They are enough to teach you how to find answers and implement them yourself.

The datasets: Sheep Astragali

Sheep astragalus morphology from the Iron Age Eastern Mediterranean.

nmar79. (2023). nmar79/Med_Sheep_Astragals: v0.1 (v0.1). Zenodo. https://doi.org/10.5281/zenodo.10276147

The datasets: Kiwulan Burials

Burial data from northeastern Taiwan ranging from the Iron Age through the European colonization period.

Li-Ying Wang & Ben Marwick (2021). Compendium of R code and data for “A Bayesian networks approach to infer social changes from burials in northeastern Taiwan during the European colonization period”. Accessed 23 Aug 2021. Online at https://osf.io/xga6n/

Assignment 1: Case study

Assignment 1 consists of finding, importing, cleaning, and exploring/analysing a dataset.

1.1 Find, import, clean

Find a dataset that you want to work with, then use a script to download and clean the data.
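A download-and-clean script could be sketched roughly as follows. This is only a minimal illustration in base R, not a required structure: the URL, the file path, and the helper names fetch_raw() and clean_data() are hypothetical, and the cleaning steps (tidy column names, trimmed text, blank strings as NA, no duplicate rows) are assumptions about what a dataset might need.

```r
# Hypothetical source and destination: replace both with your own dataset.
data_url <- "https://example.org/my-dataset.csv"
raw_path <- file.path("data", "raw", "my-dataset.csv")

# Download the raw file once, then always read from the local copy.
fetch_raw <- function(url, path) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(path)) {
    download.file(url, destfile = path, mode = "wb")
  }
  read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
}

# Apply a few generic cleaning steps (assumed, adapt to your data).
clean_data <- function(df) {
  # Lower-case column names and replace runs of other characters with "_".
  nm <- tolower(trimws(names(df)))
  nm <- gsub("[^a-z0-9]+", "_", nm)
  names(df) <- gsub("^_+|_+$", "", nm)

  # Trim whitespace in text columns and treat empty strings as missing.
  for (col in names(df)) {
    if (is.character(df[[col]])) {
      x <- trimws(df[[col]])
      x[which(x == "")] <- NA
      df[[col]] <- x
    }
  }

  # Drop exact duplicate rows.
  unique(df)
}

# Small made-up data frame standing in for a downloaded file.
raw <- data.frame(
  " Site Name " = c(" A", "A", "B "),
  "Length (mm)" = c(1.5, 1.5, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
cleaned <- clean_data(raw)
print(cleaned)

# With a real dataset the two steps would be chained:
# cleaned <- clean_data(fetch_raw(data_url, raw_path))
```

Keeping the download and the cleaning in separate functions means the raw file stays untouched on disk and the cleaning can be rerun or changed without downloading again.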

If you can’t find a dataset, you can use the full, unmodified version of the workshop data here: https://osf.io/zem9p

1.2 Exploratory data analysis

If the EDA module is included in the workshop, Assignment 1 is extended to include EDA. Plot the distribution of at least two types of variables that you are interested in exploring further.

Create at least one plot showing the relationship between two variables, and a summary table with the mean and standard deviation within groupings of a variable.
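As a rough illustration of these EDA steps, the sketch below uses base R and the built-in iris dataset as a stand-in for your own data. The choice of variables is arbitrary, and the object name summary_table is only illustrative.

```r
# Distribution of a numeric variable and of a categorical variable.
hist(iris$Sepal.Length,
     main = "Distribution of sepal length",
     xlab = "Sepal length (cm)")
barplot(table(iris$Species),
        main = "Number of observations per species")

# Relationship between two numeric variables.
plot(Sepal.Length ~ Petal.Length, data = iris,
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")

# Mean and standard deviation of sepal length within each species.
group_mean <- tapply(iris$Sepal.Length, iris$Species, mean)
group_sd <- tapply(iris$Sepal.Length, iris$Species, sd)

summary_table <- data.frame(
  species = names(group_mean),
  mean = as.vector(group_mean),
  sd = as.vector(group_sd)
)
print(summary_table)
```

The same summary can be written in the tidyverse with dplyr's group_by() and summarise(), and the plots with ggplot2; base R is used here only so the sketch runs without installing any packages.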

1.3 Data modelling

If the data modelling module is taught in the workshop, select at least two statistical models to apply to the data.

1.4 Communicating results

Use Quarto to communicate the results from the previous sections of Assignment 1. This can be a report, a short manuscript, a presentation, or any other Quarto format you prefer.

Assignment 2: Reproduce someone else’s code

You will be paired up with another person in the workshop. Send each other your projects and try to run the other person’s code.

Make a note of any issues you encounter, and provide feedback in a document.

Incorporate the feedback you receive into your own project.